Capacity of a Burst-Noise Channel 
(Manuscript received March 15, 1960)

A model of a burst-noise binary channel uses a Markov chain with two 
states G and B. In state G, transmission is error-free. In state B, the channel
has only probability h of transmitting a digit correctly. For suitably
small values of the probabilities, p, P of the B → G and G → B transitions,
the model simulates burst-noise channels. Probability formulas relate the 
parameters p, P, h to easily measured statistics and provide run distribu- 
tions for comparison with experimental measurements. The capacity C of 
the model channel exceeds the capacity C(sym. bin.) of a memoryless sym- 
metric binary channel with the same error probability. However, the differ- 
ence is slight for some values of h,p,P; then, time-division encoding schemes 
may be fairly efficient. 

I. INTRODUCTION 

In information theory the symmetric binary channel is the classical 
model of a noisy binary channel. This channel generates a sequence of 
binary noise digits $z_n$, which it adds (modulo 2) to input digits $x_n$
to produce output digits $y_n = x_n + z_n$. The symmetric binary channel
is memoryless; a sequence of independent trials produces the noise digits
$z_n$. Each trial has the same probability P(1) of producing an error and
probability 1 - P(1) = P(0) of no error. The capacity C(sym. bin.)
of this channel is well known (see Shannon¹):

$$C(\text{sym. bin.}) = 1 + P(0)\log_2 P(0) + P(1)\log_2 P(1).$$
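
As a small numerical illustration of the capacity formula for the symmetric binary channel, the following sketch evaluates it for a few error probabilities. The function name `bsc_capacity` is chosen here for illustration and is not from the original paper; the convention 0 log 0 = 0 is assumed.

```python
import math


def bsc_capacity(p_err: float) -> float:
    """Capacity in bits per digit: 1 + P(0) log2 P(0) + P(1) log2 P(1)."""
    if not 0.0 <= p_err <= 1.0:
        raise ValueError("p_err must lie in [0, 1]")
    total = 1.0
    for prob in (p_err, 1.0 - p_err):
        if prob > 0.0:  # 0 * log2(0) is taken as 0
            total += prob * math.log2(prob)
    return total


if __name__ == "__main__":
    for p_err in (0.0, 0.01, 0.1, 0.5):
        print(f"P(1) = {p_err:<5} C = {bsc_capacity(p_err):.4f}")
```

The capacity falls from 1 bit per digit at P(1) = 0 to zero at P(1) = 0.5, where the output carries no information about the input.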

Channels with memory occur in practice. If radio static or switching 
transients produce the noise, the errors group into isolated bursts (sev- 
eral errors close together). Independent trials fail to simulate such a 
burst-noise. Section II of this paper presents a model of a burst-noise 
channel that is simple enough to permit calculation of the channel ca- 
pacity C (see Sections III and VI). Sections IV and V give run distribu- 
tions, the covariance function and other probability formulas as aids to 
testing the model's applicability and to picking model parameters which
match measured statistical data. 

Of all binary channels with a given error probability P(1), the symmetric
binary channel has least capacity. Indeed, if an encoding for
signaling over the symmetric binary channel at a rate R is known, then
N sources can use this encoding in time-division multiplex at rates R/N,
each over a burst-noise channel. Here, N must be large enough so that
noise digits N apart are nearly independent. Time division protects
against other noise patterns besides bursts; still less redundant schemes
are possible. The possible increase in signaling rate, C - C(sym. bin.),
will be seen to be often surprisingly small (see Fig. 4).

II. THE MODEL 

A Markov chain with two states can be used to generate bursts. The 
two states will be called G (for good) and B (for bad or for burst). In 
state G the noise digit is always $z_n = 0$. In state B a coin is tossed to
decide whether $z_n$ will be 0 or 1.

The coin-tossing feature is included because actual bursts contain 
good digits interspersed with the errors. In the formulas that follow a 
biased coin is allowed (probability h of making no error in state B). 
All computations given here take h = 0.50, which seems a reasonable 
value. 

After producing the noise digit $z_n$, the Markov chain makes a transition
to prepare for $z_{n+1}$. To simulate burst noise, the states B and G
must tend to persist; i.e., the transition probabilities P = Prob(G → B)
and p = Prob(B → G) will be small and the probabilities Q = 1 - P,
q = 1 - p of remaining in G and B will be large. Fig. 1 is a transition
diagram for the Markov chain.

Fig. 1. Transition diagram for the Markov chain.

Runs of G will alternate with runs of B. The run lengths have geometric
distributions with mean 1/P for the G-runs and mean 1/p for
the B-runs. The geometric distribution of G-runs seems reasonable. If
the various clicks, pops and crashes, which might cause errors on a real 
channel, are not related to one another, then the times between such 
events will have the geometric distribution (see Feller,² Section XIII.9).
Only mathematical simplicity justifies the geometric distribution of
B-runs; one might construct more accurate models. Section III mentions
one way of elaborating this one; however, complicated models may
be useless without adequate statistical data to determine all the model
parameters. Section V will illustrate some of the difficulties in determining
just the three parameters P, p and h.

The following 500 digits form a typical sample of burst-noise with
parameters P = 0.03, p = 0.25, h = 0.5, produced by using random
numbers:

$$0^{62}\,11\,0^{17}\,1\,0^{46}\,11010111\,0^{11}\,11\,0^{15}\,1\,0^{42}\,1\,0^{28}\,11\,0^{90}\,1\,0^{37}$$

$$11\,0^{5}\,1001\,0^{35}\,101101\,0^{23}\,11\,0^{4}\,1\,0^{18}\,1\,0^{15}\,1101110110111\,0^{5}.$$

The exponents are run lengths; i.e., $0^{62}$ denotes a run of 62 consecutive
zeros. As expected, long runs of good digits separate the bursts.
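
A sample of this kind can be generated with a few lines of code. The following is a minimal sketch of the two-state model as described above, not the procedure actually used for the 500-digit sample (which is not specified beyond "random numbers"). The function names, the `seed` argument and the choice of starting state are assumptions made here for illustration.

```python
import random
from itertools import groupby


def gilbert_noise(n, P, p, h, seed=None, start="G"):
    """Return n noise digits from the two-state model.

    P = Prob(G -> B), p = Prob(B -> G), h = Prob(no error | state B).
    """
    rng = random.Random(seed)
    state = start
    digits = []
    for _ in range(n):
        # Emit the noise digit for the current state.
        if state == "B" and rng.random() >= h:
            digits.append(1)  # an error is possible only in state B
        else:
            digits.append(0)
        # Make the transition that prepares for the next digit.
        if state == "G":
            if rng.random() < P:
                state = "B"
        elif rng.random() < p:
            state = "G"
    return digits


def run_lengths(digits):
    """Collapse a digit sequence into (digit, run length) pairs."""
    return [(d, len(list(g))) for d, g in groupby(digits)]


if __name__ == "__main__":
    sample = gilbert_noise(500, P=0.03, p=0.25, h=0.5, seed=1960)
    print(" ".join(f"{d}^{k}" if k > 1 else str(d)
                   for d, k in run_lengths(sample)))
```

Printing the sample in run-length form makes the burst structure visible in the same way as the listing above: long runs of zeros separated by short clusters of ones.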

The 500-digit sample illustrates the impossibility of reconstructing 
the sequence of states from the sequence of digits. In portions of some 
of the long runs of zeros, the Markov chain was in state B; this went 
unnoticed because the coin tosses produced only zeros. The sample 
also contains one burst, $110^{4}1$, in which a short sojourn into state G
produced three of the four zeros.

The fraction of time spent in state B is P(B) = P/(p + P). Since 
errors occur only in state B, and then just with probability 1 — h, the 
error probability is 

$$P(1) = (1 - h)P(B) = (1 - h)\frac{P}{p + P}. \tag{1}$$
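
For concreteness, equation (1) can be evaluated for the parameters of the 500-digit sample. This is only a sketch; the helper names are chosen here for illustration, and p + P > 0 is assumed.

```python
def state_probabilities(P, p):
    """Stationary probabilities (P(G), P(B)) of the two-state chain."""
    return p / (p + P), P / (p + P)


def error_probability(P, p, h):
    """Equation (1): P(1) = (1 - h) * P / (p + P)."""
    return (1.0 - h) * P / (p + P)


if __name__ == "__main__":
    P, p, h = 0.03, 0.25, 0.5
    pG, pB = state_probabilities(P, p)
    # Stationarity check: one transition step leaves P(B) unchanged.
    assert abs(pG * P + pB * (1 - p) - pB) < 1e-12
    print(f"P(B) = {pB:.4f}, P(1) = {error_probability(P, p, h):.4f}")
```

With P = 0.03, p = 0.25, h = 0.5 the chain spends about 11 percent of the time in state B, so roughly one digit in nineteen is in error.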

III. THE CAPACITY 

Let H denote the entropy of the sequence of noise digits $\ldots, z_n, z_{n+1}, \ldots$.
For all inputs x to the burst-noise channel, the conditional entropy,
$H_x(y)$, of the output y knowing the input x is the same:

$$H_x(y) = H.$$

A simple argument then shows that the capacity C of the burst-noise
channel is C = 1 - H (a monogram source with probabilities 0.5 for 0
and 0.5 for 1 attains the rate C).

Shannon¹ (Section 7) gives a simple way of computing an entropy
H from state probabilities [P(G), P(B) here] and transition probabilities.
McMillan³ (Section 2.0) notes that this result tacitly assumes that
the state sequence is reconstructible from the digit sequence. Since a
reconstruction is impossible here, H has a more complicated formula.

A definition of H is

$$H = \lim_{N\to\infty} \sum_{z_i = 0,1} P(z_1,\ldots,z_N)\,h(z_1,\ldots,z_N), \tag{2}$$

with

$$h(z_1,\ldots,z_N) = -\sum_{z_{N+1}=0}^{1} P(z_{N+1}\mid z_1,\ldots,z_N)\log_2 P(z_{N+1}\mid z_1,\ldots,z_N). \tag{3}$$

If $z_i = 1$, the corresponding state is certainly B and

$$P(z_{i+1},\ldots,z_{i+j}\mid z_1,\ldots,z_{i-1},1) = P(z_{i+1},\ldots,z_{i+j}\mid 1) \tag{4}$$

follows for all $j \ge 1$. Then,

$$P(z_{N+1}\mid z_1,\ldots,z_{i-1},1,z_{i+1},\ldots,z_N) = P(z_{N+1}\mid 1,z_{i+1},\ldots,z_N)$$

follows and also

$$h(z_1,\ldots,z_{i-1},1,z_{i+1},\ldots,z_N) = h(1,z_{i+1},\ldots,z_N).$$

Thus, just the number of consecutive zeros at the end of the block
$(z_1,\ldots,z_N)$ determines $h(z_1,\ldots,z_N)$ completely. Each of the $2^N$ h's in the
sum (2) is one of the N + 1 numbers

$$h(1),\ h(10),\ \ldots,\ h(10^k),\ \ldots,\ h(10^{N-1}),\ h(0^N)$$

(again exponents denote run lengths). After using this simplification in
(2), summing and letting $N \to \infty$, the result is

$$H = \sum_{K=0}^{\infty} P(10^K)\,h(10^K). \tag{5}$$

The terms of (5) involve probabilities of runs of zeros. Section IV
will give a formula for the conditional probability, u(K), of a run of
K or more zeros following a one, that is, $u(K) = P(0^K \mid 1)$. The convention
u(0) = 1 will be adopted. Then, in (5),

$$P(10^K) = P(1)\,u(K)$$

[(1) gives P(1)]. Also, (3), together with $P(0 \mid 10^K) = u(K+1)/u(K)$,
provides an expression for $h(10^K)$:

$$h(10^K) = -\frac{u(K+1)}{u(K)}\log_2\frac{u(K+1)}{u(K)} - \Bigl[1 - \frac{u(K+1)}{u(K)}\Bigr]\log_2\Bigl[1 - \frac{u(K+1)}{u(K)}\Bigr]. \tag{6}$$

Using (6), the terms of (5) rearrange into

$$C = 1 + P(1)\sum_{K=0}^{\infty} v(K)\log_2 v(K), \tag{7}$$

where $v(K) = u(K) - u(K+1)$. Section IV contains formulas for
v(K). Although (7) seems simpler than (5) and (6), it converges slowly.
In Section VI the computation method uses a modification of (5) and
(6).

Note that $v(K) = P(0^K 1 \mid 1)$. Another derivation of (7) proceeds
by showing that the noise sequence consists of successive blocks of
digits of the form $1, 01, 001, \ldots, 0^K 1, \ldots$, chosen independently, and with
probability v(K) for the block $0^K 1$. Then $-\sum v(K)\log_2 v(K)$ is the
information per block and P(1) is the average number of blocks per
digit.

Equations (5), (6) and (7) apply to certain other channels. These
formulas followed just from (4), which holds whenever the lengths of
successive runs of zero are independent. Whenever such independence
can be assumed, a more elaborate model might use $v(0), v(1), v(2), \ldots$
directly as parameters. Then P(1) in (7) is

$$P(1) = \Bigl[\sum_{K=0}^{\infty} (K+1)\,v(K)\Bigr]^{-1}.$$

As a check, the symmetric binary channel has $v(K) = P(1)[P(0)]^K$
and (7) sums to C(sym. bin.).
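
The check for the symmetric binary channel can also be carried out numerically. The sketch below sums (7) over the geometric run probabilities $v(K) = P(1)[P(0)]^K$, truncating the series once the terms fall below a tolerance, and also recovers P(1) from the v(K). The function names and the truncation tolerance are assumptions made here for illustration.

```python
import math


def capacity_from_run_probs(v, p_one):
    """Equation (7): C = 1 + P(1) * sum_K v(K) log2 v(K)."""
    return 1.0 + p_one * sum(x * math.log2(x) for x in v if x > 0.0)


def p_one_from_run_probs(v):
    """P(1) = 1 / sum_K (K + 1) v(K), the reciprocal of the mean block length."""
    return 1.0 / sum((K + 1) * x for K, x in enumerate(v))


def bsc_run_probs(p_one, tol=1e-15):
    """v(K) = P(1) * P(0)**K, truncated when the terms drop below tol."""
    v, term = [], p_one
    while term > tol:
        v.append(term)
        term *= 1.0 - p_one
    return v


if __name__ == "__main__":
    p_one = 0.1
    v = bsc_run_probs(p_one)
    direct = 1.0 + p_one * math.log2(p_one) + (1 - p_one) * math.log2(1 - p_one)
    print(f"(7): {capacity_from_run_probs(v, p_one):.6f}  direct: {direct:.6f}")
```

The same two functions accept any list v(0), v(1), ... of independent run probabilities, which is the more elaborate parameterization suggested in the text.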

IV. PROBABILITIES 

Recurrent events theory (Feller,² Section XIII) provides some probabilities
needed in Sections V and VI.

4.1 Recurrence Times for State B 

Let $f_K$ denote the conditional probability, in state B, that the first
return to B will happen at step K:

$$f_K = P(G^{K-1}B \mid B).$$

Then $f_1 = q$, $f_2 = pP$ and $f_K = pQ^{K-2}P$ for $K \ge 2$. It is convenient to
make these probabilities the coefficients of a generating function F(t)
of recurrence time probabilities:

$$F(t) = \sum_{K=1}^{\infty} f_K t^K = qt + \frac{pPt^2}{1 - Qt}. \tag{8}$$

For example, the probability $f_K^{(m)}$ that the mth return to B happens
at step K has the generating function

$$\sum_{K=1}^{\infty} f_K^{(m)} t^K = [F(t)]^m. \tag{9}$$

The probability of no return to B in k steps is $pQ^{k-1}$. Then the probability
s(K,m) of exactly m returns to B in K steps (but not necessarily
a return on step K) is

$$s(K,m) = f_K^{(m)} + \sum_{k=1}^{K} f_{K-k}^{(m)}\,pQ^{k-1}.$$

The corresponding generating function is

$$\sum_{K} s(K,m)\,t^K = \Bigl(1 + \frac{pt}{1 - Qt}\Bigr)[F(t)]^m. \tag{10}$$

4.2 Recurrence Times for Ones

Starting from a one (and hence from B), the next one must occur at
a return to B, but not necessarily the first return. The probability that
the next one occurs at the mth return to B and at step K is

$$h^{m-1}(1 - h)f_K^{(m)}.$$

Then, recurrence time probabilities for ones are

$$v(K - 1) = P(0^{K-1}1 \mid 1) = \sum_{m=1}^{\infty} h^{m-1}(1 - h)f_K^{(m)}.$$

Equation (9) now provides the generating function $V(t) = \sum v(K)t^K$:

$$V(t) = \frac{(1 - h)F(t)}{t\,[1 - hF(t)]}. \tag{11}$$

Likewise, the probability u(K) that no one appears in the next K
steps is

$$u(K) = \sum_{m=0}^{\infty} s(K,m)\,h^m,$$

which has generating function

$$U(t) = \frac{1 + (p - Q)t}{(1 - Qt)[1 - hF(t)]}. \tag{12}$$

By (8),

$$U(t) = \frac{1 + (p - Q)t}{D(t)}, \tag{13}$$

where $D(t) = 1 - (Q + hq)t - h(p - Q)t^2$.
Factor the quadratic D(t):

$$D(t) = (1 - Jt)(1 - Lt),$$

where $2J = Q + hq + \sqrt{(Q + hq)^2 + 4h(p - Q)}$ and L is the same
expression with negative square root. Now, (13) becomes

$$U(t) = \frac{1}{J - L}\Bigl(\frac{J + p - Q}{1 - Jt} - \frac{L + p - Q}{1 - Lt}\Bigr).$$

The coefficient of $t^K$ in the power series for U(t) is

$$u(K) = \frac{(J + p - Q)J^K - (L + p - Q)L^K}{J - L}. \tag{14}$$

To find a recurrence formula for u(K), write (13) as $D(t)U(t) = 1 + (p - Q)t$
and equate coefficients of $t^K$:

$$u(K) = (Q + hq)\,u(K - 1) + h(p - Q)\,u(K - 2) \tag{15}$$

for K = 2, 3, .... Initial values are

$$u(0) = 1, \qquad u(1) = p + hq.$$

For calculating, (15) is more convenient than (14).
Similar steps lead from (11) to

$$v(K) = \frac{1 - h}{J - L}\bigl[(qJ + p - Q)J^K - (qL + p - Q)L^K\bigr]. \tag{16}$$

For K = 2, 3, ..., v(K) also satisfies (15), but with initial values

$$v(0) = (1 - h)q, \qquad v(1) = (1 - h)(pP + hq^2).$$
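
The recurrence (15) and the closed forms (14) and (16) can be cross-checked numerically. The following sketch computes u(K) and v(K) both ways; the function names are chosen here for illustration, and the closed forms assume J and L are distinct.

```python
import math


def roots_J_L(P, p, h):
    """Roots J >= L defined by D(t) = (1 - J t)(1 - L t)."""
    Q, q = 1.0 - P, 1.0 - p
    s = Q + h * q
    root = math.sqrt(s * s + 4.0 * h * (p - Q))
    return (s + root) / 2.0, (s - root) / 2.0


def by_recurrence(P, p, h, n, first, second):
    """Iterate (15) from the two given initial values; returns terms 0..n."""
    Q, q = 1.0 - P, 1.0 - p
    seq = [first, second]
    for K in range(2, n + 1):
        seq.append((Q + h * q) * seq[K - 1] + h * (p - Q) * seq[K - 2])
    return seq[: n + 1]


def u_by_recurrence(P, p, h, n):
    return by_recurrence(P, p, h, n, 1.0, p + h * (1.0 - p))


def v_by_recurrence(P, p, h, n):
    q = 1.0 - p
    return by_recurrence(P, p, h, n, (1.0 - h) * q,
                         (1.0 - h) * (p * P + h * q * q))


def u_closed(P, p, h, K):
    """Equation (14)."""
    Q = 1.0 - P
    J, L = roots_J_L(P, p, h)
    return ((J + p - Q) * J**K - (L + p - Q) * L**K) / (J - L)


def v_closed(P, p, h, K):
    """Equation (16)."""
    Q, q = 1.0 - P, 1.0 - p
    J, L = roots_J_L(P, p, h)
    return (1.0 - h) * ((q * J + p - Q) * J**K
                        - (q * L + p - Q) * L**K) / (J - L)


if __name__ == "__main__":
    P, p, h = 0.03, 0.25, 0.5
    u = u_by_recurrence(P, p, h, 10)
    for K in range(0, 11, 5):
        print(K, u[K], u_closed(P, p, h, K))
```

Both sequences also satisfy v(K) = u(K) - u(K + 1), which gives a further consistency check on the initial values.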

4.3 Covariance

The covariance function of this binary noise is just a joint probability
$r(K) = \mathrm{Prob}(z_0 = 1, z_K = 1)$. A formula for the generating function
$R(t) = \sum r(K)t^K$ is

$$R(t) = P(1)\{1 + tV(t) + [tV(t)]^2 + \cdots\} = \frac{P(1)}{1 - tV(t)} = \frac{P(1)D(t)}{(1 - t)[1 + (p - Q)t]}.$$

The term $P(1)[tV(t)]^m$ in the sum generates the probabilities of finding
$z_0 = z_K = 1$, with exactly m - 1 of the digits $z_1,\ldots,z_{K-1}$ equal to 1.

An explicit formula for r(K) follows by expanding R(t) in a power
series:

$$r(0) = P(1), \qquad r(K) = P(1)^2\Bigl[1 + \frac{p(Q - p)^K}{P}\Bigr], \quad K = 1, 2, \ldots. \tag{17}$$

V. PARAMETER MATCHING 



The three parameters p, P, h are not directly observable, so methods 
of deducing them from statistical measurements must now be considered. 
We will express p, P, h as functions of three other easily estimated noise 
parameters. One suitable set of three parameters (involving only trigram 
statistics) is 

$$a = P(1), \qquad b = P(1 \mid 1), \qquad c = \frac{P(111)}{P(101) + P(111)}.$$

Here, c is the conditional probability of finding the place between two
ones filled by a one, and it has the expression

$$c = \frac{(1 - h)q^2}{q^2 + pP}.$$

Solving for p, P, h in terms of a, b, c,

$$1 - p = q = \frac{ac - b^2}{2ac - b(a + c)}, \qquad h = 1 - \frac{b}{q}, \qquad P = \frac{ap}{1 - h - a}. \tag{18}$$

If h = 0.5 is assumed, then q = 2b and no c measurement is needed.

For illustration, the 500-digit sample in Section II contains thirty-eight
1's, fifteen 11's, seven 101's, and three 111's. Estimates of a, b, c
are a = 38/500, b = 15/38, c = 3/10. With these estimates, (18) gives
ridiculous parameters (p is negative). The trouble is that 500 digits
provide too small a sample. In particular, the estimate c = 3/10, based
on only 10 observations, is far from the correct value c = 0.49. If h =
0.50 is assumed, the estimates become p = 0.21, P = 0.036 (compare
with true values p = 0.25, P = 0.03).
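
The counting and the solution of (18) can be reproduced in a few lines. In the sketch below, the run-length string is a transcription of the Section II sample as read here; it is an assumption, adopted because it has 500 digits and reproduces the counts quoted above. The function names are likewise chosen for illustration.

```python
SAMPLE_RUNS = (
    "0^62 11 0^17 1 0^46 11010111 0^11 11 0^15 1 0^42 1 0^28 11 0^90 1 0^37 "
    "11 0^5 1001 0^35 101101 0^23 11 0^4 1 0^18 1 0^15 1101110110111 0^5"
)


def expand(spec):
    """Expand tokens such as '0^62' into runs; plain tokens are copied."""
    parts = []
    for tok in spec.split():
        if "^" in tok:
            digit, count = tok.split("^")
            parts.append(digit * int(count))
        else:
            parts.append(tok)
    return "".join(parts)


def count_overlapping(s, pattern):
    """Number of (possibly overlapping) occurrences of pattern in s."""
    return sum(1 for i in range(len(s) - len(pattern) + 1)
               if s.startswith(pattern, i))


def solve_from_abc(a, b, c):
    """Equation (18): returns (p, P, h)."""
    q = (a * c - b * b) / (2.0 * a * c - b * (a + c))
    p = 1.0 - q
    h = 1.0 - b / q
    return p, a * p / (1.0 - h - a), h


def solve_assuming_h(a, b, h=0.5):
    """With h fixed, q = b / (1 - h); returns (p, P)."""
    p = 1.0 - b / (1.0 - h)
    return p, a * p / (1.0 - h - a)


if __name__ == "__main__":
    digits = expand(SAMPLE_RUNS)
    n1, n11, n101, n111 = (count_overlapping(digits, s)
                           for s in ("1", "11", "101", "111"))
    a, b, c = n1 / len(digits), n11 / n1, n111 / (n101 + n111)
    print("counts:", n1, n11, n101, n111)
    print("from (18):", solve_from_abc(a, b, c))
    print("with h = 0.5:", solve_assuming_h(a, b))
```

With the sample estimates, the three-parameter solution returns a negative p, as the text notes, while fixing h = 0.5 gives p close to 0.21. Feeding the exact values of a, b, c for P = 0.03, p = 0.25, h = 0.5 into `solve_from_abc` recovers those parameters, which suggests the failure lies in the small sample rather than in (18).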

After finding p, P, and h, the results of Section IV suggest comparisons
between run measurements and the probabilities u(K) or r(K).
Fig. 2 shows curves of some run probabilities $P(10^K) = P(1)u(K)$ (on
a log scale) versus K. As shown by (14), these curves straighten out for
large K with slopes determined by J.




Fig. 2. Typical run distributions, with h = 1/2.

Data on runs of zero can provide another estimate of p, P, h. The
fraction of runs of length K or more is an estimate of u(K). By (14),
one expects to find constants J, L, A such that

$$u(K) = AJ^K + (1 - A)L^K. \tag{19}$$

These constants are easily found by fitting a curve of the form (19) to
the measured run distribution. First, A and J are chosen to give the
correct behavior $AJ^K$ for large K. Afterward, L is chosen to improve
the fit for small K. Expressions for p, P, h in terms of A, J, L are

$$h = \frac{LJ}{J - A(J - L)}, \qquad P = \frac{(1 - L)(1 - J)}{1 - h}, \qquad p = A(J - L) + \frac{(1 - J)(L - h)}{1 - h}.$$

Fig. 3 shows run distributions for two different telephone circuits
transmitting binary data. These were two of the thousands of circuits
in a recent large-scale program of telephone circuit measurements (see
Alexander, Gryb and Nast⁴).* Channel 1146 carried an exchange call; it
used loaded cable and only local exchange switching facilities. Channel
1296 was a toll channel longer than 500 miles; it used K-carrier, a radio
path, and loaded cables at the ends. These channels were chosen as
examples because they were two of the noisiest cases measured, and
thus provided plenty of data. The step functions in Fig. 3 show the
fractions of zero runs of lengths K or more from a sample of about 130
consecutive zero runs for each channel. The smooth curves show the
curves (19) that fit these distributions. In the case of channel 1146,
$u(K) = 0.9946^K$ provided a good fit; then channel 1146 was well approximated
by a symmetric binary channel with P(0) = 0.9946. The results
for channel 1296 look more like Fig. 2. The straight line asymptote is
the function $AJ^K$ with parameters A = 0.184 and J = 0.99743 chosen
to approximate the data for large K. The parameter value L = 0.81
makes the curve (19) fit the data for small K. These values of A, J, L
provide the estimates

h = 0.84, P = 0.003, p = 0.034.
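
The conversion from fitted constants A, J, L to model parameters can be sketched as follows. The inverse map used here follows from (14) and (19) together with J + L = Q + hq and JL = h(Q - p); the function names are chosen for illustration and are not from the original paper.

```python
import math


def params_from_fit(A, J, L):
    """Model parameters (h, P, p) from the constants of the fit (19)."""
    h = L * J / (J - A * (J - L))
    P = (1.0 - L) * (1.0 - J) / (1.0 - h)
    p = A * (J - L) + (1.0 - J) * (L - h) / (1.0 - h)
    return h, P, p


def fit_from_params(P, p, h):
    """Constants (A, J, L) of (19) implied by the model, using (14)."""
    Q, q = 1.0 - P, 1.0 - p
    s = Q + h * q
    root = math.sqrt(s * s + 4.0 * h * (p - Q))
    J, L = (s + root) / 2.0, (s - root) / 2.0
    return (J + p - Q) / (J - L), J, L


if __name__ == "__main__":
    h, P, p = params_from_fit(A=0.184, J=0.99743, L=0.81)
    print(f"h = {h:.2f}, P = {P:.3f}, p = {p:.3f}")
```

Applying `fit_from_params` and then `params_from_fit` returns the starting parameters, which is a convenient check on the algebra before using the formulas on measured data.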



* The curves appearing in Ref. 4 show only combined data from hundreds of 
channels. Since these channels differ greatly among themselves, the curves in 
Ref. 4 do not have the form (19). 

Fig. 4. Capacities C and C(sym. bin.) as functions of p, P, with h = 1/2.

The 500-digit sample of Section II provides a run distribution with 
more statistical fluctuations than in Fig. 3 because of the smaller sample 
size. The curve fitting yields A = 0.385, J = 0.961, L = 0.32 and h =
0.432, P = 0.047, p = 0.232.

VI. CAPACITY COMPUTATIONS 

By (14) and (16), u(K) and v(K) behave like multiples of $J^K$ for
large K. In the most interesting cases P is small and J is nearly 1.0
($J \ge Q$ always); then (7) converges slowly. However,

$$\frac{u(K + 1)}{u(K)} \to J$$

for large K and, by (6),

$$h(10^K) \to -J\log_2 J - (1 - J)\log_2(1 - J) = h_0.$$

Here, $h(10^K)$ approaches its limiting value $h_0$ rapidly; indeed, $L =
Q + hq - J \le hq$. When h = 0.5, typical values of L are about 0.5 or
less, and the $L^K$ term in (14) becomes negligible when K reaches 10 or
15. Thus, the approximation $h(10^K) = h_0$ is good for all $K \ge K_0$ where
$K_0$ is only moderately large. The corresponding terms of the infinite
series (5) sum to

$$\sum_{K=K_0}^{\infty} P(10^K)\,h_0 = h_0 P(1)\sum_{K=K_0}^{\infty} u(K) = h_0\Bigl[1 - P(1)\sum_{K=0}^{K_0-1} u(K)\Bigr].$$

The last step used the identity

$$P(1)[u(0) + u(1) + u(2) + \cdots] = 1,$$

which follows from (13) with t = 1. Then, the terms of (5) with
$K < K_0$, together with the correction just derived, suffice to compute C
accurately.
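
The computation just described can be sketched as follows: u(K) comes from the recurrence (15), the terms of (5) with K < K_0 are summed using (6), and the remaining terms are replaced by the correction involving $h_0$. The function names and the default K_0 = 40 are assumptions made here for illustration; they are not the procedure used for Fig. 4.

```python
import math


def binary_entropy(x):
    """-x log2 x - (1 - x) log2 (1 - x), with the value 0 at the endpoints."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def bsc_capacity(p_one):
    """Capacity of the memoryless symmetric binary channel."""
    return 1.0 - binary_entropy(p_one)


def gilbert_capacity(P, p, h, K0=40):
    """C = 1 - H, with H from (5) and (6) truncated at K0 plus the tail term."""
    Q, q = 1.0 - P, 1.0 - p
    p_one = (1.0 - h) * P / (p + P)                     # equation (1)
    s = Q + h * q
    J = (s + math.sqrt(s * s + 4.0 * h * (p - Q))) / 2.0
    h0 = binary_entropy(J)                              # limit of h(10^K)
    u = [1.0, p + h * q]                                # recurrence (15)
    for K in range(2, K0 + 1):
        u.append(s * u[K - 1] + h * (p - Q) * u[K - 2])
    head = 0.0   # sum over K < K0 of P(10^K) h(10^K)
    mass = 0.0   # P(1) times the sum over K < K0 of u(K)
    for K in range(K0):
        if u[K] <= 0.0:
            break
        head += p_one * u[K] * binary_entropy(u[K + 1] / u[K])
        mass += p_one * u[K]
    return 1.0 - (head + h0 * (1.0 - mass))


if __name__ == "__main__":
    P, p, h = 0.03, 0.25, 0.5
    p_one = (1.0 - h) * P / (p + P)
    print(f"C = {gilbert_capacity(P, p, h):.4f}, "
          f"C(sym. bin.) = {bsc_capacity(p_one):.4f}")
```

A useful sanity check is the case P + p = 1, where the state sequence is itself a sequence of independent trials; the noise is then memoryless and the sketch returns the symmetric binary channel capacity.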

Fig. 4 shows contours of constant C and C(sym. bin.) versus p, P for
h = 0.5. [C(sym. bin.) was computed with P(1) given by (1).] If the
average burst length is not large (p not too small), the difference between
the two capacities is slight.

The author is indebted to Miss M. A. Lounsberry for the computations
shown in Figs. 2 and 4.